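Transformers have achieved remarkable success in sequence modeling and beyond, but they suffer from computational and memory costs that are quadratic in the length of the input sequence. Efficient transformers that leverage techniques such as sparse and linear attention and hashing tricks have been proposed to reduce this quadratic complexity, but they significantly degrade accuracy. In response, we first interpret the linear attention and residual connections used in computing the attention map as gradient descent steps. We then introduce momentum into these components and propose the \emph{momentum transformer}, which uses momentum to improve the accuracy of linear transformers while maintaining linear memory and computational complexity. Furthermore, we develop an adaptive strategy that computes the momentum value for our model based on the optimal momentum for quadratic optimization. This adaptive momentum eliminates the need to search for the optimal momentum value and further enhances the performance of the momentum transformer. A range of experiments on both autoregressive and non-autoregressive tasks, including image generation and machine translation, demonstrate that the momentum transformer outperforms popular linear transformers in training efficiency and accuracy.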
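As a rough illustration of the idea above, causal linear attention can be written as a recurrence over a running state, and a heavy-ball-style momentum term can be added to that state update. The sketch below is an assumption for illustration only, not the paper's exact formulation: the feature map `elu(x) + 1`, the names `beta` and `gamma`, and the unchanged normalizer `z` are all hypothetical choices. With `beta=0` and `gamma=1` it reduces to plain causal linear attention.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map commonly used for linear attention
    # (assumed here; the abstract does not specify one).
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V, beta=0.0, gamma=1.0):
    """Causal linear attention as a recurrence, with optional momentum.

    Q, K: arrays of shape (n, d); V: array of shape (n, dv).
    beta:  hypothetical momentum coefficient (0 disables momentum).
    gamma: hypothetical step size applied to the momentum buffer.
    Cost is linear in the sequence length n.
    """
    n, d = Q.shape
    dv = V.shape[1]
    phi_q, phi_k = feature_map(Q), feature_map(K)
    S = np.zeros((d, dv))   # running state: accumulates phi(k_j) v_j^T
    P = np.zeros((d, dv))   # momentum buffer for the state update
    z = np.zeros(d)         # running normalizer: sum of phi(k_j)
    out = np.empty((n, dv))
    for i in range(n):
        P = beta * P + np.outer(phi_k[i], V[i])
        S = S + gamma * P
        z = z + phi_k[i]
        out[i] = (phi_q[i] @ S) / (phi_q[i] @ z)
    return out
```

Because the state `S` has a fixed size, memory does not grow with sequence length; the momentum buffer `P` only doubles that constant.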
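Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the performance of a vanilla PINN can be significantly boosted by adapting the location of the collocation points as training proceeds. Specifically, we propose a novel adaptive collocation scheme that progressively allocates more collocation points (without increasing their number) to regions where the model is making higher errors (based on the gradient of the loss function in the domain). This, coupled with a judicious restarting of the training during any optimization stalls (by simply resampling the collocation points to adjust the loss landscape), leads to better estimates of the prediction error. We present results for several problems, including 2D Poisson and diffusion-advection systems with different forcing functions. We find that training vanilla PINNs on these problems can result in up to 70% prediction error in the solution, especially in the regime of few collocation points. In contrast, our adaptive schemes can achieve up to an order of magnitude smaller error, with computational complexity similar to the baseline. Furthermore, we find that the adaptive methods consistently perform on par with or slightly better than the vanilla PINN method, even in regimes with many collocation points. The code for all experiments has been open-sourced.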
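The adaptive collocation idea above can be sketched as importance resampling over a fixed point budget. This is a minimal sketch under stated assumptions, not the authors' released code: `score_fn` is a hypothetical stand-in for whatever per-point error proxy is used (the abstract mentions the gradient of the loss), the domain is assumed to be the unit square, and `uniform_frac` is an illustrative parameter that keeps some points spread uniformly.

```python
import numpy as np

def resample_collocation(score_fn, n_points, n_candidates=10_000,
                         uniform_frac=0.2, rng=None):
    """Draw a fixed number of 2D collocation points, biased toward high scores.

    score_fn: hypothetical callable mapping an (m, 2) array of points to m
              non-negative error proxies (e.g. residual or loss-gradient size).
    n_points: total number of collocation points returned (the budget is fixed).
    """
    rng = np.random.default_rng(rng)
    # Candidate pool drawn uniformly from the assumed domain [0, 1]^2.
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
    scores = np.abs(score_fn(candidates)) + 1e-12  # keep every probability > 0
    probs = scores / scores.sum()

    n_uniform = int(round(uniform_frac * n_points))
    n_adaptive = n_points - n_uniform
    # Adaptive part: sample candidates in proportion to their score.
    idx = rng.choice(n_candidates, size=n_adaptive, replace=False, p=probs)
    # Uniform part: retain some coverage of the whole domain.
    uniform_pts = rng.uniform(0.0, 1.0, size=(n_uniform, 2))
    return np.concatenate([candidates[idx], uniform_pts], axis=0)
```

Calling this again whenever training stalls would correspond to the restart-by-resampling step, since the same budget of points is simply redrawn from the current error landscape.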
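Multi-fidelity modeling and learning are important in applications related to physical simulation. They can leverage both low-fidelity and high-fidelity examples for training, reducing the cost of data generation while still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice the choice of fidelity is often continuous and infinite, corresponding for example to a continuous mesh spacing or finite element length. In this paper, we propose Infinite-Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within continuous, infinite fidelities to bolster prediction accuracy. Our model can interpolate and/or extrapolate predictions to novel fidelities, which can be even higher than the fidelities of the training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiply it with a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural ordinary differential equation (ODE) to capture the complex relationships within, and integrate information throughout, the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor and use a tensor-Gaussian variational posterior to develop a scalable inference algorithm for massive outputs. We show the advantage of our method on several benchmark tasks in computational physics.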
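Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge (often expressed as differential equations) is valuable because it is complementary to data and can potentially help overcome issues such as data sparsity, noise, and inaccuracy. In this work, we propose a simple yet powerful and general framework, AutoIP (for Automatically Incorporating Physics), that can integrate all kinds of differential equations into Gaussian processes (GPs) to enhance prediction accuracy and uncertainty quantification. These equations can be linear or nonlinear; spatial, temporal, or spatio-temporal; complete or incomplete with unknown source terms; and so on. Based on kernel differentiation, we construct a GP prior to sample the values of the target function, the equation-related derivatives, and the latent source functions, all of which jointly follow a multivariate Gaussian distribution. The sampled values are fed to two likelihoods: one to fit the observations and the other to conform to the equation. We use the whitening method to evade the strong dependency between the sampled function values and the kernel parameters, and we develop a stochastic variational learning algorithm. In both simulation and several real-world applications, AutoIP shows improvement over vanilla GPs, even when using rough, incomplete equations.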
We present a dynamic path planning algorithm to navigate an amphibious rotorcraft through a concave, time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotorcraft dynamics above and below the water. The six-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, along with the required control inputs. The rotorcraft has a three-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics are disregarded and the person is treated as a body or a collection of body parts. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database and finds that, commensurate with prior research in psychology, human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > 0.8) and sadness (d > 0.5). A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women as for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are likely to be associated with sexual descriptions relative to images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP (age 17), and up to 40% of the time for Stable Diffusion (ages 14 and 18); the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on automatically collected web scrapes learn biases of sexual objectification, which propagate to downstream applications.
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as supply chains (inventory optimization), traffic, and the transition towards carbon-free energy generation through battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that leads to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
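To make the sample average approximation (SAA) idea mentioned above concrete, the following toy sketch picks the single decision that minimizes the average cost across a set of forecast scenarios. It is an illustrative assumption, not the winning team's method: the cost model `energy_cost`, its prices, and the brute-force search over `candidates` are all hypothetical, whereas the competition entries used mixed integer programming over full schedules.

```python
import numpy as np

def saa_decision(scenarios, candidates, cost_fn):
    """Return the candidate decision with the lowest scenario-averaged cost.

    scenarios:  sampled values of the uncertain quantity (e.g. forecast demand).
    candidates: finite list of decisions to evaluate.
    cost_fn:    callable cost_fn(decision, scenario) -> float.
    """
    scenarios = np.asarray(scenarios, dtype=float)
    avg_costs = [np.mean([cost_fn(x, s) for s in scenarios]) for x in candidates]
    best = int(np.argmin(avg_costs))
    return candidates[best], float(avg_costs[best])

def energy_cost(purchase, demand, ahead_price=1.0, spot_price=4.0):
    # Hypothetical cost model: energy bought ahead is cheap, while any
    # shortfall must be covered at a higher spot price.
    return ahead_price * purchase + spot_price * max(demand - purchase, 0.0)

# Three equally weighted demand scenarios and a small grid of purchase levels.
decision, expected_cost = saa_decision([10, 20, 30], [0, 10, 20, 30], energy_cost)
```

Optimizing against all scenarios jointly, rather than against a single point forecast, hedges the decision against forecast error at the price of a larger optimization problem.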
We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both tasks of ComParE21 (the PRS and CCS challenges), outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations. Index Terms: audio classification, attention, mel-spectrogram, unbalanced data-sets, computational paralinguistics
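One plausible reading of "overlapping vertical patching" is sketched below, as an assumption rather than the authors' implementation: each patch spans every mel bin and a fixed number of time frames, and consecutive patches overlap because the stride is smaller than the patch width. The function name and the parameters `patch_width` and `stride` are hypothetical.

```python
import numpy as np

def vertical_patches(mel, patch_width, stride):
    """Split a mel-spectrogram into flattened, full-height patches.

    mel:         array of shape (n_mels, n_frames).
    patch_width: number of time frames per patch.
    stride:      hop between patch starts; stride < patch_width gives overlap.
    Returns an array of shape (n_patches, n_mels * patch_width).
    """
    n_mels, n_frames = mel.shape
    if patch_width > n_frames:
        raise ValueError("patch_width must not exceed the number of frames")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    starts = range(0, n_frames - patch_width + 1, stride)
    # Each patch keeps the whole frequency axis and a window of time frames.
    return np.stack([mel[:, s:s + patch_width].reshape(-1) for s in starts])
```

Each row of the result could then be linearly projected to a token embedding, as in a standard vision transformer; trailing frames that do not fill a complete patch are dropped in this sketch.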
Common to all kinds of recurrent neural networks (RNNs) is the intention to model relations between data points through time. When there is no immediate relationship between subsequent data points (e.g., when the data points are generated at random), we show that RNNs are still able to remember a few data points back into the sequence by memorizing them by heart using standard backpropagation. However, we also show that for classical RNNs, LSTM networks, and GRU networks, the distance between data points across recurrent calls that can be reproduced this way is highly limited (compared to even a loose connection between data points) and subject to various constraints imposed by the type and size of the RNN in question. This implies the existence of a hard limit (well below the information-theoretic one) on the distance between related data points within which RNNs are still able to recognize said relation.
